Mutational analysis of large genes with complex genomic structures plays an important role in medical genetics.
Technical limitations associated with current mutation screening protocols have placed increased emphasis on
the development of new technologies to simplify these procedures. High-density arrays of >90,000
oligonucleotide probes, 25 nucleotides in length, were designed to screen for all possible heterozygous germ-line
mutations in the 9.17-kb coding region of the ATM gene. A strategy for rapidly developing multiexon PCR 
amplification protocols in DNA chip-based hybridization analysis was devised and implemented in preparing 
target for the 62 ATM coding exons. Improved algorithms for interpreting data from two-color experiments, 
where reference and test samples are cohybridized to the arrays, were developed. In a blinded study, 17 of 18 
distinct heterozygous and 8 of 8 distinct homozygous sequence variants in the assayed region were detected 
accurately along with five false-positive calls while scanning >200 kb in 22 genomic DNA samples. Of eight
heterozygous sequence changes found in more than one sample, six were detected in all cases. Five previously 
unreported sequence changes, not found by other mutational scanning methodologies on these same samples, 
were detected that led to either amino acid changes or premature truncation of the ATM protein. DNA 
chip-based assays should play a valuable role in high throughput sequence analysis of complex genes. 



The Human Genome Project will soon provide the 
scientific community with the complete sequence 
of all human genes. A major challenge for medical 
genetics will be in using this information to eluci- 
date genetic contributions to single gene and mul- 
tifactorial diseases. Exhaustive mutational screens 
of candidate genes, identified through linkage 
analysis and proposed function, must be under- 
taken and sequence changes must be correlated 
with disease states to address these problems. Many 
technical obstacles must be overcome to make this 
approach both economical and time efficient for 
analyzing large genes with complex genomic struc- 
tures. 

A prime example of a challenging system for 
mutational analysis is the ATM gene (Savitsky et al. 
1995, 1997), responsible for ataxia telangiectasia 
(AT). AT is an autosomal recessive disorder characterized by cerebellar and progressive neuromotor degeneration, immune deficiency, and the appearance of dilated blood vessels in the eyes and face. Patients also manifest growth retardation, premature aging of skin and hair, chromosomal instability, lymphoreticular malignancies, and acute sensitivity to ionizing radiation. The protein-coding region of the ATM gene contains 9168 bp in 62 coding
exons spread over 146 kb of genomic DNA (Platzer 
et al. 1997). It is a member of a family of proteins 
containing carboxy-terminal regions with homol- 
ogy to the catalytic domain of phosphatidylinositol 
3-kinase (PI 3-kinase), which have been implicated 
in diverse activities such as telomere maintenance, 
cell-cycle arrest, and DNA repair (Platzer et al. 1997). 

The ATM gene displays a complex mutational 
spectrum. Thus far, >100 different somatic and 
germ-line mutations have been identified, the ma- 
jority of which cause premature protein truncation 
(Gilad et al. 1996b; Wright et al. 1997). Very few 
common alleles, most notably the 103C→T allele
among North African Jews (Gilad et al. 1996a), have 
been found (Telatar et al. 1998). The large size and 
genomic structure of the ATM gene greatly complicate the process of screening genomic DNA
samples for all possible sequence variations. Single- 
strand conformation polymorphism (SSCP) assay 
(Vorechovsky et al. 1996; Stilgenbauer et al. 1997), 
restriction endonuclease fingerprinting (Gilad et al. 
1998a, b), heteroduplex analysis (Telatar et al. 1998), 
and protein truncation assay (Telatar et al. 1996) 
protocols have been used in mutational analysis. 
These begin with separate amplification of individual ATM exons from genomic DNA or (when possible) transcript by PCR or RT-PCR protocols, respectively. These methods are not amenable to complete analysis of the entire ATM genomic coding and splice junction sequences in a single reaction, and most require gel electrophoresis, which
complicates scale-up and automation. The RT-PCR 
methods carry a further risk of missing mutations 
that result in unstable RNA. 

Significant evidence exists that heterozygous ATM carriers have an elevated lifetime risk of developing breast cancer (Swift et al. 1987; Athma et al. 1996; Larson et al. 1998); however, there is disagreement about the magnitude of this effect (Easton 1994; Bishop and Hopper 1997; Fitzgerald et al. 1997). This debate is especially relevant owing to the high frequency of ATM mutation carriers, estimated to comprise ~1.4% of the general population, who could account for up to 6.5% of all breast cancer cases (Athma et al. 1996). Large-scale mutational analysis of carefully selected test and control groups will be necessary to resolve this controversy, but current mutational analysis tools preclude such an undertaking.

Hybridization-based methodologies for high-throughput mutational analysis using high-density oligonucleotide arrays (DNA chips) have been developed recently (Hacia et al. 1998a; Ramsey 1998). Light-directed combinatorial chemical synthesis approaches enable the manufacture of high-density oligonucleotide arrays of >10⁵ distinct species, typically 25 nucleotides in length, on 1.2 × 1.2 cm glass surfaces (Fodor et al. 1991; McGall et al. 1997). Oligonucleotide arrays have been used to screen for sequence variations in the CFTR gene (Cronin et al. 1996), the human immunodeficiency virus-1 (HIV-1) reverse transcriptase and protease genes (Kozal et al. 1996), the β-globin gene (Yershov et al. 1996), the mitochondrial genome (Chee et al. 1996), and the BRCA1 gene (Hacia et al. 1996, 1998a). Furthermore, they have been used to identify and genotype single nucleotide polymorphisms (Wang et al. 1998), monitor gene expression (Lockhart et al. 1996), analyze genetic screens (Shoemaker et al. 1996), design antisense oligonucleotides (Milner et al. 1997), identify bacterial species (Gingeras et al. 1998), and acquire information from orthologous genes in related species (Hacia et al. 1998b).

In this study we describe the design of a high- 
density oligonucleotide array-based assay to screen 
the ATM gene for all possible sequence variations in 
the heterozygous state. This represents one of the most ambitious applications of DNA chips to mutation detection yet described. A streamlined strategy
to analyze efficiently all 62 ATM coding exons is 
described that should be applicable toward virtually 
any DNA chip-based mutation screen. Furthermore, 
improved algorithms for heterozygous mutation de- 
tection were developed to analyze the results of 
blinded studies conducted to determine the sensi- 
tivity and specificity of two-color DNA chip-based 
assays. 

RESULTS 

Oligonucleotide Array Design 

Extending previous oligonucleotide array-based 
mutational analysis of the 3.45-kb BRCA1 exon 11 
sequence (Hacia et al. 1996), a pair of DNA chips 
(interrogating sense and antisense strands) contain- 
ing >95,000 oligonucleotides were designed to de- 
tect all possible sequence variations in the ATM cod- 
ing region including the 3' GT donor and 5' AG 
acceptor splice junction sequences of each coding 
exon. Four 25-mer sequencing probes, substituted 
with one of the four nucleotides in the central po- 
sition, interrogate the identity of each target 
nucleotide (Fig. 1). For every perfect match probe (fully complementary to the target sequence) in each set of sequencing probes, another identical perfect match probe is present elsewhere in the array. This redundancy increases assay sensitivity and specificity by compensating for nonspecific signals that may arise from localized surface imperfections such as microscopic scratches in the array surface. Although the BRCA1 exon 11 chip design included probes complementary to every 1- to 5-bp deletion as well as single-base-pair insertions, the large size of the ATM gene (>2.5× larger than BRCA1 exon 11) precluded the inclusion of additional mutation-specific probes owing to space limitations on the array surface.

Figure 1 Classes of arrayed oligonucleotides. [Schematic: substitution probes, 5' ... A, C, G, or T ... 3'; perfect match probes, 5' ... base complementary to the wild-type target ... 3'.] Each position is interrogated with 10 separate 25-mer oligonucleotides, 5 (two wild-type and three base-substitution) each for sense and antisense strands. Substitution probes contain each of the four nucleotide substitutions 13 bases from the 3' end of the oligonucleotide (one of these represents the wild-type sequence). A redundant set of wild-type perfect match probes is tiled in the lower portion of the array.
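The tiling scheme in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration, not the original array design software: it assumes each probe is written 5'→3' as the reverse complement of a 25-nt target window, that the interrogated base sits at the 13th position (index 12), and the function and key names (`probes_for_position`, `sub_A`, `redundant_pm`) are hypothetical.

```python
from typing import Dict

PROBE_LEN = 25
CENTER = PROBE_LEN // 2  # index 12: the 13th base from either end of a 25-mer
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the antiparallel complement of a DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def probes_for_position(target: str, pos: int) -> Dict[str, str]:
    """Build the five probes that interrogate target[pos] on one strand.

    Four substitution probes carry A, C, G, or T at the central position
    (one of them is the perfect match); a fifth, redundant perfect match
    probe would be tiled elsewhere on the array.
    """
    start, end = pos - CENTER, pos + CENTER + 1
    if start < 0 or end > len(target):
        raise ValueError("position lacks 12 nt of flanking sequence")
    perfect_match = reverse_complement(target[start:end])
    probes = {
        f"sub_{base}": perfect_match[:CENTER] + base + perfect_match[CENTER + 1:]
        for base in "ACGT"
    }
    probes["redundant_pm"] = perfect_match
    return probes


if __name__ == "__main__":
    toy_target = "A" * 12 + "G" + "C" * 12  # interrogate the central G
    for name, seq in probes_for_position(toy_target, 12).items():
        print(name, seq)
```

On the toy target, `sub_C` equals `redundant_pm` because C pairs with the interrogated G. Applying the same function to the opposite strand would give the ten probes per position described in the Figure 1 legend.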

Iterative Strategy for ATM Target Preparation 

A time-limiting and labor-intensive step commonly 
encountered in screening for sequence variations in 
large genes is the nonparallel analysis of individual 
exons. Because gel-based methodologies are depen- 
dent on mutation-based electrophoretic mobility 
shifts, analysis of multiple exons in a single gel lane 
has confounding effects. In addition to problems 
associated with separating multiple exons of similar 
size in one lane, mutant species may comigrate with 
other wild-type exons and thus mask their presence. 
Furthermore, it is generally not feasible to deduce 
the general identity of a mutant species (i.e., from 
which exon it is derived). 

Hybridization-based mutational analysis meth- 
odologies allow parallel analysis of multiple nucleic 
acid species independent of their individual sizes 
(Wang et al. 1998). Therefore, it is possible to am- 
plify separately every coding region of interest, pool 
them, produce RNA target, and hybridize to oligo- 
nucleotide arrays. Although this strategy works well 
when analyzing small numbers of exons, as pool 
sizes increase the most robust PCR/transcription re- 
actions tend to dominate, and weaker amplifica- 
tions are lost. 

To address this issue, we pursued an empirical 
iterative hybridization-based strategy to evaluate 
rapidly and develop multiexon PCR reactions in- 
volving exons of similar sizes that allow for robust 
production of ATM test target samples (Fig. 2). First, 
primer sets containing T3 and T7 RNA polymerase 
promoter sequences, based on those used in previ- 
ous studies (Vorechovsky et al. 1996), were devel- 
oped to amplify all individual coding exons using a 
single PCR protocol (Table 1). Second, all 62 primer 
pairs were pooled into a single multiplex PCR reaction. Reaction products were used as templates in T3 and T7 RNA polymerase-mediated in vitro transcription reactions to produce sense and antisense RNA
targets. Target was hybridized to ATM arrays, and sense and antisense strand data were averaged to produce a composite data set. Primer pairs that produced target giving >90% base-calling accuracy, based on the perfect match probe for a nucleotide position having a 1.2-fold greater hybridization signal intensity than that of the next highest single nucleotide substitution probe in each interrogation probe set (Chee et al. 1996; Hacia et al. 1996), were placed into a separate ATM target pool A. The remaining primer pairs were subjected to another round of multiexon in vitro transcription template preparation. Second-round and pool A multiexon PCR reaction products were combined and in vitro transcribed to produce target. Amplicons giving >90% base-calling accuracy from the second-round pool were placed into a separate pool B. The primers for the remaining exons underwent 17 additional rounds of multiexon target preparation analysis to yield 13 pools (A-M) of ATM exon primer pairs (Table 2). To further optimize these multiexon pools, five PCR primer sites were redesigned to produce increased hybridization signals for their respective exons. In the finished system, separate multiexon PCR reactions were carried out, and the products were pooled and in vitro transcribed in a single reaction. Several different hybridization conditions were then tested to determine which produced optimal signal-to-noise ratios based on the aforementioned base-calling algorithm.

Figure 2 Iterative multiexon PCR and in vitro transcription optimization strategy based upon DNA chip hybridization. [Flowchart: pool primer pairs for remaining exons → multiplex PCR → in vitro transcription → hybridization → group primer pairs for exons with specific hybridization signals → are all exons in a working group? If yes, finished; otherwise, repeat with the remaining exons.]
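The iterative grouping loop described above can be expressed as a short Python sketch. The wet-lab steps (multiplex PCR, in vitro transcription, hybridization, and base calling) are abstracted behind a hypothetical `assay` callback that returns a base-calling accuracy for each exon still being optimized, given the pools already finalized; the >90% threshold follows the text, while the `max_rounds` limit and the toy assay are illustrative assumptions.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical stand-in for one round of multiplex PCR, transcription,
# hybridization, and base calling: maps each remaining exon to its accuracy.
Assay = Callable[[Set[int], List[Set[int]]], Dict[int, float]]


def iterative_grouping(exons: Set[int], assay: Assay,
                       threshold: float = 0.90,
                       max_rounds: int = 20) -> Tuple[List[Set[int]], Set[int]]:
    """Split exons into multiplex pools, one pool per successful round."""
    pools: List[Set[int]] = []
    remaining = set(exons)
    for _ in range(max_rounds):
        if not remaining:
            break
        accuracy = assay(remaining, pools)
        passed = {exon for exon in remaining if accuracy[exon] > threshold}
        if passed:  # exons with specific signals form the next working group
            pools.append(passed)
            remaining -= passed
    return pools, remaining


def toy_assay(remaining: Set[int], pools: List[Set[int]]) -> Dict[int, float]:
    """Toy model: only the three lowest-numbered remaining exons amplify well."""
    best = sorted(remaining)[:3]
    return {exon: (0.95 if exon in best else 0.50) for exon in remaining}


if __name__ == "__main__":
    found_pools, leftover = iterative_grouping(set(range(1, 8)), toy_assay)
    print(found_pools, leftover)
```

With the toy assay, seven exons settle into three pools over three rounds. Passing the already finalized pools to `assay` mirrors the text, where pool A products were combined with second-round products before transcription.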
Table 1. ATM Amplification Primers

Exon  Forward sequence          Reverse sequence
4     TTTTTCACACCTCTTTCTCTC     TAATAATGGGTTACTAATCAC
5     TGAAATGTGTGATTAGTAAC      AAAATAAAGACAGTAAAAT
6     AGTATTCAACGAGTTTCTG       ATCTGTTAAGCCATTTATT
7     GTTGCCATTCCAACTGTCTTA     CAAACAACAACCTTCAAAACA
8     CTGTATGGGATTATGGAATA      CAAAAGAAAAAGAGATTAGATT
9     TTGAGCTTGTTTGTTTCTTC      GAC?CTATGTTTGAATGA
10    CTACCAGTGTAAACAGAGTA      TAGGCTTTTTGTGAGAACA
11    GCAACAACAGCGAAACTCTG      ATGAGAAAATGGTAACACTT
12    TGTGATCGAATAGTTTTCAA      GTAACAAACTATGAAAATGA
13    CAATACCTTGCttttcacAAT     TGGCATCAAATAAGTGGAGAG
14    GCTTTTGGTCTTCTAACTGA      CAGCTAAAATTATCATCTTTG
15    CATATAAGGCAAAGCATTAG      GTTTACCAAAGTTGAATCATA
16    TTTTATTTGTGGTTTACTTT      TCACAGGAATACATTTCATT
17    TTGCATTTTTCCTTCTATTCA     CTCCAGCCTGGGTGACAGAGA
18    CACTGTCTGCCGAGAATAAT      GCAAAACAGGAAGCATACTT
19    CTCCTGCAAGAAGCCATCT       AGAAATCCCAAGTAGTAAAT
20    GTTGTGCCCTTCTCTTAGTGTT    CTCATTACATTTAGTCAGCAA
21    TTTTTCCCTCCTACCATCTT      CTTAACAGAACACATCAGTTATT
22    TAAAATAACTGATGTGTTCTGTT   CAAAACTTGCATTCGTATC
23    TTTGGAAAACTTACTTGATT      TGGTTAAATATGAAATAGAG
24    TCTTTGTTTC?TTAATGAGTA     (not legible in source)
25    GTTTGTTTGCTTGCTTGTTT      TTTATGGGATATTCATAGC
26    TGGAGTTCAGTTGGGATTTTA     TTCACAGTCACCTAAGGAACC
27    CTTAACACATTGACTTTTTGG     GTATGTGTGTTGCTGGTGAG
28    TACTTTAATGCTGATGGTA       GAATAAATCGAATAAATAGC
29    GCTGTCTTGACGTTCACAG       TTAAAAAGAGTGATGTCTATAA
30    TTAAAACGATGACTCTATT       ACGAATGTTCTATTATTA
31    CCGAGTATCTAATTAAACAAG     CAGGATAGAAAGACTGCTTAT
32    CTTACTGGTTGTTGTTGTTTT     CCATTTTGAAGATGAGTCAG
33    GTTTTGTTGGCTTACTTTA       GCATTACAGATTTTTGAA
34    GTGTTAAAAGCAAGTTACATT     AGAAACAGGTAGAAATAGC
35    (not legible in source)   (not legible in source)
36    CAGCATTATAG?TTGA?AT       (not legible in source)
37    TGGTGTACTTGA?AGGCATTT     CCCACAGCAAACAGAACTG
38    TTTCTAATCCCTTTCTTTCT      TAAACAGGTCATAAACAAG
39    CA?TTTTACTCAAACTATTG      TCTTAAATCCATCTTTCTCTA
40    T?ACAAGAAGGAAGAAGGT       CGTAAGAAGCAACACTCATT
41    CAACATGCTTTTATTTTGATA     TA?A?ACCCTTATTGAGACAA
42    GCTGTCTTGACGTTCACAG       TGAGATAAATACTGTCATAAA
43    AATTTGCTAAATTTATAGACCG    CCCCAAAAAAAAAAATCAA
44    TTTTCACAATCTTTTCTTAT      GTCATGGCTTTACCAAATCTGG
45    CTGGTTTTCTGTTC?ATATCTTT   TGTTTAGAATGAGGAGAGAGGCA
46    TATCTTAGC?CTTCTCTTTTTA    GTAACTTTGTCTTTTCATAAT
47    TTCCCTGAAAACCTCTTCTT      GGATAACAAAGTCATACGA
48    CTCTTCCTTACATGAACTCTA     AGAGGTAAGATGACATAGTT
49    TTCCCATATCTCATTTTCAT      ACACTAATCCAGCCAATAAA
50    CCGTACATGAAGGGCAGTTGG     TTGATGAAAAGATGAAGCATA
51    TTAAATTGGTTGTGTTTTCTT     CCAAGTCACTCTTTCTATG
52    GTTCATGGCTTTTGTGTTTT      TAGAATATTGGGCTGAGTAAC
53    CTTGCTTAGATGTGAGAATA      GTTTGATTTTCAGGTTTACTT
54    AATCTAATAGTTCTTTTCTT      CTGAATATCACACTTCTAAA
55    Only one primer legible in source: GTAACACAGCAAGAAAGTAACGT
56    CCTTCAATGCTGTTCCTCAGT     AGGTTGAAACATATGAAATTTGCC
57    GCAAATAGTGTATCTGACCTA     CTAAAACTCTAAGGGCTAAGCCA
58    TTTGCTATTCTCAGATGACTCT    TGTTTTGGTGAACTAACAGAAG
59    CTGACTCTGATAGCTGAATC      GCTCTCAGCTTTAATAAGCC
60    CTGTTAGCTTCTTGTAGG        CACATCATCACTATCATCCC
61    TAGAAAGAGATGGAATCAGTG     TCTTGGTAGGCAAACAACATT
62    TCAAACCTCCTAACTTCACTG     TTATTTCCCTCCTTTACTT
63    CAGGCTCAGCATACTACACAT     CGAGATACACAGTCTACCT
64    TGAAACTGCTTCTACTGTT       AATCTGAAAAACTGACAAC
65    AATAGAAGGTCCTGTTGTCAGT    CCCTACTTAAAGTATGTTGGCA

? marks one or more characters that are illegible in the source.

Two-Color Loss of Hybridization Signal Analysis 

Two-color cohybridization experiments were used 
previously in scanning exon 11 of the BRCA1 gene
for all possible heterozygous sequence changes (Ha- 
cia et al. 1996). Wild-type fluorescein-labeled 
(green) reference and biotinylated [stained with 
phycoerythrin-streptavidin (red) conjugate after ar- 
ray hybridization] test targets were competitively 
cohybridized to arrays and the relative binding of 
both targets to all probes was quantitated. The ratio 
of reference and test target occupancy to each per- 
fect match oligonucleotide probe was used to detect 
sequence variations between samples (Chee et al. 
1996; Hacia et al. 1996). 

Test sample targets containing nonrepetitive sequence variations should have reduced affinity (relative to reference wild-type target) toward the family of perfect match probes designed to hybridize to the nucleotide tract now containing these variations (Chee et al. 1996; Hacia et al. 1996). This results in relative localized losses of hybridization signal that can be displayed by plotting the ratio of reference to test sample hybridization signals for all overlapping perfect match oligonucleotide probes. A peak of perfect match probe hybridization signal intensity ratios should be centered near mutant sequences as a result of the localized relative decrease in mutant test sample hybridization. In practice, the baseline reference/test perfect match probe signal ratio values (which we will refer to as "loss of hybridization signal" baselines) obtained by comparing hybridization intensities on a single chip generally fluctuate between 0.5 and 2.5. However, fluctuations in the reference/test hybridization signal ratios for each perfect match probe can be normalized against data obtained from similar experiments because of reproducible two-color cohybridization results and probe redundancy (Hacia et al. 1996).

Table 2. ATM Coding Exon Multiplex PCR Groups

Multiexon group  Exons amplified
A                13, 18, 26, 31, 32, 34, 40, 45, 52
B                37, 44, 50, 55, 57, 58, 61, 63
C                7, 14, 20, 29, 47, 56, 60, 62
D                9, 11, 15, 22
E                4, 17, 21, 23
F                35, 36, 39, 41, 42, 46, 54, 64
G                16, 48, 49, 51, 53, 65
H                5, 6, 25, 30
I                8, 38
J                10, 28, 59
K                19, 27
L                12, 43
M                24, 33
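As a quick consistency check on Table 2, the short Python snippet below (illustrative only; the dictionary simply transcribes the table) confirms that the 13 multiexon groups cover each of the 62 coding exons, numbered 4 through 65 as in Table 1, exactly once.

```python
from collections import Counter

# Multiexon PCR groups transcribed from Table 2.
GROUPS = {
    "A": [13, 18, 26, 31, 32, 34, 40, 45, 52],
    "B": [37, 44, 50, 55, 57, 58, 61, 63],
    "C": [7, 14, 20, 29, 47, 56, 60, 62],
    "D": [9, 11, 15, 22],
    "E": [4, 17, 21, 23],
    "F": [35, 36, 39, 41, 42, 46, 54, 64],
    "G": [16, 48, 49, 51, 53, 65],
    "H": [5, 6, 25, 30],
    "I": [8, 38],
    "J": [10, 28, 59],
    "K": [19, 27],
    "L": [12, 43],
    "M": [24, 33],
}

# Count how often each exon appears across all groups.
counts = Counter(exon for exons in GROUPS.values() for exon in exons)
duplicates = sorted(exon for exon, n in counts.items() if n > 1)
missing = sorted(set(range(4, 66)) - set(counts))

print(f"{len(GROUPS)} groups, {sum(counts.values())} exon slots")
print("duplicates:", duplicates, "missing:", missing)
```

Both lists come out empty, so every coding exon is amplified in exactly one multiexon reaction.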

For convenience in visualizing relative losses of 
hybridization signal in test samples, it is useful for 
the average baseline of the reference/test perfect 
match probe hybridization signals for each of the 62 
individual ATM coding exons to have a value of 
one. However, this is not always the case with unprocessed data, because of subtle variations in target concentration (in part due to variability in multiexon PCR product yields), hybridization conditions, or the arrays themselves. Therefore, it is
necessary to calculate and apply a specific correc- 
tion factor tailored for each of the individual coding 
exons to produce an average baseline value of one in 
each instance. 
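The two-color ratio and per-exon baseline correction can be sketched as follows. This is a simplified illustration rather than the published processing code: it assumes the correction factor is the reciprocal of the arithmetic mean ratio within each exon, it omits the sequence-composition correction factors described in Methods, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence


def loss_of_signal_ratios(reference: Sequence[float],
                          test: Sequence[float]) -> List[float]:
    """Reference (green) / test (red) signal for each perfect match probe."""
    return [ref / tst for ref, tst in zip(reference, test)]


def normalize_by_exon(ratios: Sequence[float],
                      exon_of: Sequence[int]) -> List[float]:
    """Scale ratios so that each exon's mean baseline equals one (assumed estimator)."""
    by_exon: Dict[int, List[float]] = defaultdict(list)
    for ratio, exon in zip(ratios, exon_of):
        by_exon[exon].append(ratio)
    factor = {exon: 1.0 / mean(values) for exon, values in by_exon.items()}
    return [ratio * factor[exon] for ratio, exon in zip(ratios, exon_of)]


if __name__ == "__main__":
    raw = loss_of_signal_ratios([2.0, 2.0, 4.0, 4.0], [1.0, 1.0, 1.0, 1.0])
    print(normalize_by_exon(raw, [10, 10, 11, 11]))
```

In real data a mutation peak slightly inflates its exon's mean; a more robust estimator such as the median may be preferable, but the text does not say which one was used.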

After using this correction factor we still ob- 
served that certain perfect match probe sets tended 
to have larger deviations from loss of hybridization 
signal baseline values (above or below the ideal 
value of one) than the majority of the data set. We 
reasoned that in some circumstances target hybrid- 
ization could be significantly more sensitive to 
probe sequence composition and structure than 
others. Although reference sample concentration is 
equivalent in each hybridization reaction (as it is 
produced in a single large batch and aliquoted into 
each reaction), test sample concentrations may vary 
slightly because they are individually prepared. 
Concentration differences are likely attributable to 
subtle changes in the multiexon PCR reaction con- 
ditions that may cause specific exons within each 
multiexon PCR reaction pool to be amplified at different efficiencies. Based on these considerations, we
searched for and analyzed patterns in probe se- 
quence composition and potential secondary struc- 
tures to predict the relative likelihood and extent of 
any given probe to provide a lower signal-to-noise 
ratio. Using these predictions we formulated multi- 
plicative correction factors to help minimize hybrid- 
ization signal baseline fluctuations (see Methods). 

A third general type of correction was used to minimize the effects of nonreproducible phenomena, such as microscopic scratches on the chip surface. Two steps were taken toward this end. The first was to truncate the corrected ratios to 1.50 and 0.67 for values >1.5 and <0.67, respectively. The second was to discard potential outlier values. This was effected by taking a moving window average of the corrected and truncated loss of hybridization signal ratio values over nine consecutive overlapping probe positions. The highest and lowest values within a window are discarded before the average within the window is calculated from the data for each strand.
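The truncation and outlier-trimming steps can be illustrated with a short Python sketch. The 0.67/1.50 limits, the nine-position window, and discarding the highest and lowest values come from the text; emitting one value per full window, smoothing each strand separately, assuming both strands are already aligned to the same coordinates, and averaging them into a composite are assumptions made for illustration.

```python
from typing import List, Sequence

LOW, HIGH = 0.67, 1.50
WINDOW = 9


def truncate(ratio: float, low: float = LOW, high: float = HIGH) -> float:
    """Clamp a corrected ratio into [low, high]."""
    return min(max(ratio, low), high)


def trimmed_moving_average(ratios: Sequence[float],
                           window: int = WINDOW) -> List[float]:
    """Average each full window after dropping its highest and lowest values."""
    clipped = [truncate(r) for r in ratios]
    smoothed = []
    for start in range(len(clipped) - window + 1):
        kept = sorted(clipped[start:start + window])[1:-1]
        smoothed.append(sum(kept) / len(kept))
    return smoothed


def composite(sense: Sequence[float], antisense: Sequence[float]) -> List[float]:
    """Average smoothed sense and antisense profiles (assumed pre-aligned)."""
    return [(s + a) / 2 for s, a in zip(trimmed_moving_average(sense),
                                         trimmed_moving_average(antisense))]


if __name__ == "__main__":
    # One scratch-like spike and one dropout inside an otherwise flat window.
    print(trimmed_moving_average([0.1] + [1.0] * 7 + [5.0]))
```

Clamping first bounds the influence of any single probe, and trimming the extremes of each window then removes isolated spikes entirely, so a lone artifact cannot produce a peak on its own.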

When these correction factors are incorporated, stable baseline values generally within 1.0 ± 0.15 can be obtained for ratios of reference and test perfect match probe hybridization signals interrogating nucleotide tracts that are identical between reference and test samples. Ideal heterozygous mutations produce peaks with a maximum magnitude of 1.5, reflecting the maximum value allowed by the signal-processing algorithm. Cross-hybridization of the mutant allele to the wild-type probe will reduce this ratio closer to 1.0. The width and shape of these peaks are primarily a function of the sequence change and probe length. In arrays consisting of probes 25 nucleotides in length, point mutations should produce a peak width of 25 nucleotides, whereas deletions and insertions of x bp should produce peaks ~(25 + x) bp in width. Although these theoretical properties are not always consistent with empirical observations because of sequence context effects, peaks of widths <21 bp are considered to result from background noise rather than the presence of a test sample sequence variation.
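The peak-width criterion can be sketched as a scan for contiguous runs of elevated ratios. The text does not state the threshold that defines a peak's extent; the sketch assumes it is the upper edge of the 1.0 ± 0.15 baseline (1.15), and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

BASELINE_HIGH = 1.15  # assumed edge of the 1.0 +/- 0.15 baseline
MIN_WIDTH = 21        # narrower peaks are treated as background noise


def find_peaks(ratios: Sequence[float], baseline_high: float = BASELINE_HIGH,
               min_width: int = MIN_WIDTH) -> List[Tuple[int, int, float]]:
    """Return (start, end_exclusive, max_height) for each sufficiently wide peak."""
    peaks = []
    start = None
    for i, ratio in enumerate(list(ratios) + [0.0]):  # sentinel closes a final run
        if ratio > baseline_high:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_width:
                peaks.append((start, i, max(ratios[start:i])))
            start = None
    return peaks


if __name__ == "__main__":
    profile = [1.0] * 10 + [1.4] * 25 + [1.0] * 10 + [1.3] * 15 + [1.0] * 5
    print(find_peaks(profile))
```

In the toy profile the 25-position run is reported as a candidate point mutation, whereas the 15-position run is dismissed as noise despite its height.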

Two-Color Loss of Signal Analysis 

Two-color loss of signal analysis displaying the 5932 G→T ATM nonsense mutation in sample GM08388 is given in Figure 3. The corrected ratios of reference (green) to test (red) perfect match probe hybridization signals, averaged for sense and antisense strand target data, are plotted for all ATM-coding nucleotide positions. Only a single peak in this analysis is of sufficient magnitude and width to indicate a sequence change. The beginning of an intronic loss of signal peak several nucleotides preceding exon 35 is not scored, as the only intronic sequences we are considering in these experiments are the immediate 3' AG acceptor and 5' GT donor dinucleotides. Perfect match probe tilings extend 10 bp into adjacent intronic sequence to allow accurate loss of signal analysis for these splice junction dinucleotides. Analyzing sequences further into intronic regions is more challenging because of the repetitive polypyrimidine and polypurine tracts that accompany splice junctions and may confound hybridization-based sequence analysis (Hacia et al. 1996). The increased level of polymorphism in these noncoding regions also increases the difficulty of hybridization-based sequence analysis and data interpretation.

Figure 3 Two-color loss of signal assay for a nonsense mutation. Fluorescein-labeled (green) reference and biotinylated (red) test GM08388 targets were cohybridized to the array. To correct for reproducible differences in the hybridization efficiencies of reference and test targets, the ratio of fluorescein to phycoerythrin signal at each wild-type position was normalized against ratios derived from 10 separate chip cohybridization experiments as described in Methods. Averaged sense and antisense strand ratios from GM08388 are shown with the identity of each exon listed below the appropriate data points. The labeled peak at the mutated 5932 G/T position (a nonsense mutation) is present on both strands.

Blinded Analysis of ATM Samples 

To ascertain the specificity and sensitivity of two- 
color hybridization analysis, blinded ATM muta- 
tional analysis was performed on genomic DNA 
samples. Twenty-three samples were screened for se- 
quence variations in the ATM-coding region (in- 
cluding the 3' AG and 5' GT acceptor and donor 
sites for each exon). These included ATM homozy- 
gotes, compound heterozygotes, and heterozygous 
carriers of AT mutations. 

Sequence variations were scored relying solely on loss of hybridization signal data. Averaged sense and antisense strand (composite) data were considered first. Composite loss of hybridization peaks of maximum height 1.3 or greater were flagged immediately for further dideoxy sequencing analysis, unless found in an exon of unacceptable baseline fluctuation. Individual sense and antisense strand loss of signal data were considered for averaged composite peaks of magnitudes between 1.2 and 1.3. These sequences were flagged for further analysis only if the sense and antisense strand loss of signal peaks were of similar shape (not necessarily of the same magnitude) and in phase with respect to one another.
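The scoring rules above can be written as a small decision function. The 1.3 and 1.2 thresholds and the strand-concordance requirement are taken from the text and the Table 4 legend; reducing "similar shape and in phase" to a single boolean supplied by the caller is a simplifying assumption, and the function names are hypothetical.

```python
def peak_symbol(height: float) -> str:
    """Loss of signal symbol as used in the Table 4 legend."""
    if height >= 1.3:
        return "++"
    if height >= 1.2:
        return "+/-"
    return "-"


def flag_for_sequencing(composite_height: float, strands_concordant: bool,
                        baseline_acceptable: bool = True) -> bool:
    """Decide whether a composite peak is sent for dideoxy sequencing."""
    if not baseline_acceptable:
        return False  # exon baseline too variable to score
    if composite_height >= 1.3:
        return True   # strong composite peak: flag immediately
    if composite_height >= 1.2:
        # Marginal peak: require similarly shaped, in-phase strand peaks.
        return strands_concordant
    return False


if __name__ == "__main__":
    for height, concordant in [(1.35, False), (1.25, True), (1.25, False), (1.1, True)]:
        print(height, concordant, peak_symbol(height),
              flag_for_sequencing(height, concordant))
```

Only marginal peaks depend on the per-strand check; strong peaks are flagged on the composite alone.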

Of 18 distinct heterozygous changes confirmed to occur in the ATM coding regions, 17 were detected using the ATM chip assay (Table 3). The 6015insC mutation was the only known heterozygous change not detected at least once in the assay. In addition, all eight known homozygous sequence changes were detected. Of the 18 distinct heterozygous sequence changes confirmed to occur in the assayed ATM sequence, 8 were present in the heterozygous state in more than one sample because of the relatedness of the individuals from whom the genomic DNA was obtained. In two of these cases (8266 A→T and 1141insGACA) the mutation was identified correctly in one sample but not in another. Interestingly, in both cases the change was missed because the chip designed to interrogate the sense strand sequence failed to detect it, mainly because baseline noise masked weak two-color loss of hybridization peaks. For this ATM chip design, it may be necessary to repeat hybridization experiments for samples with marginal loss of signal peaks occurring in regions of baseline fluctuation to increase assay sensitivity. Five false-positive sequence tracts were flagged for confirmatory dideoxy sequencing analysis based solely on the loss of signal assay. Of 1240 exons interrogated, seven (exons 5 and 12 twice, as well as exons 19, 26, and 27 once each) provided baseline values too variable for accurate loss of signal analysis, corresponding to a >99.4% PCR amplification success rate.
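The summary figures in this paragraph follow from simple proportions, shown here as a worked check:

```latex
\begin{aligned}
\text{distinct heterozygous changes detected} &= \frac{17}{18} \approx 0.944 \\
\text{distinct homozygous changes detected} &= \frac{8}{8} = 1.000 \\
\text{PCR amplification success} &= \frac{1240 - 7}{1240} = \frac{1233}{1240} \approx 0.9944
\end{aligned}
```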
Gain of Signal Analysis

In this ATM array design, gain of signal analysis is limited to the detection of single nucleotide substitutions. The simple base-calling algorithm described previously (Chee et al. 1996; Hacia et al. 1996) provides a useful metric by which to evaluate the specificity of these probes. When sense and antisense data derived from biotinylated reference target are combined, 98.5% of the ATM-coding nucleotides are called correctly. Many of the miscalls occurred in regions of high A/T content, where weak hybridization signals were presumably the result of the decreased stability of probe/target interactions. To increase the base-calling efficiency in these regions, target was transcribed in the presence of 5-methyluridine triphosphate in place of uridine triphosphate. Incorporation of 5-methyluridine into target has been shown previously to increase hybridization of A/U-rich regions of BRCA1 (Hacia et al. 1998c). In theory, this is due to the increased thermodynamic stability of A/T relative to A/U base pairs (Saenger 1984). ATM targets containing 5-methyluridine substitutions gave a composite base-calling efficiency of 99.1%. Although the modified targets were not used in loss of signal analysis in the current study, it appears likely that they will prove useful in mutational analysis.
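The simple base-calling criterion referred to here, and used earlier to score the multiexon PCR pools, can be sketched as follows: a position counts as correctly called when its perfect match probe is at least 1.2-fold brighter than the brightest of the other three substitution probes. Treating "1.2-fold greater" as an inclusive threshold, the function names, and the intensity values are assumptions for illustration.

```python
from typing import Dict, Sequence

FOLD = 1.2


def called_correctly(intensities: Dict[str, float], wild_type: str,
                     fold: float = FOLD) -> bool:
    """True if the perfect match probe beats every substitution probe by `fold`."""
    best_other = max(v for base, v in intensities.items() if base != wild_type)
    return intensities[wild_type] >= fold * best_other


def base_calling_accuracy(probe_sets: Sequence[Dict[str, float]],
                          reference: str, fold: float = FOLD) -> float:
    """Fraction of positions in `reference` that are called correctly."""
    hits = sum(called_correctly(s, b, fold) for s, b in zip(probe_sets, reference))
    return hits / len(reference)


# Hypothetical intensities for two positions whose wild-type base is A.
EXAMPLE_SETS = [
    {"A": 900.0, "C": 200.0, "G": 150.0, "T": 100.0},  # clear call
    {"A": 110.0, "C": 100.0, "G": 90.0, "T": 80.0},    # weak, A/T-rich-like miscall
]

if __name__ == "__main__":
    print(base_calling_accuracy(EXAMPLE_SETS, "AA"))
```

The second position fails because its perfect match signal exceeds the runner-up by only 1.1-fold, which is the kind of weak discrimination the text attributes to A/T-rich regions.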

Nucleotide substitution probes showed promis- 
ing sensitivity in detecting heterozygous single 
nucleotide substitutions on retrospective analysis of 
the blinded ATM sample study. Figure 4 displays a false-colored image of the fluorescent hybridization pattern produced on the entire ATM chip by antisense test GM08388 target. Many of the darker regions represent array probes interrogating several bases further into intronic regions of high A/T content, and could be partially rescued by 5-methyluridine incorporation. The region of the array interrogating nucleotide position 5932 in samples
GM11261 (5932 G/G) and GM08388 (5932 G/T) is 
shown in Figure 4, b and c. The only significant 
difference between the hybridization patterns is the 
increased fluorescent signal at the 5932 T probe in 
sample GM08388, indicating the presence of the 
5932 T allele. The wild-type 5932 G allele is also 
detected in GM08388. Therefore, chip hybridiza- 
tion analysis successfully genotypes GM08388 as a 
5932 G/T heterozygote. 
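The heterozygote call for GM08388 can be mimicked with a toy rule, assuming that an allele is scored as present when its probe signal is within 1.2-fold of the other allele's probe, mirroring the 1.2-fold criterion used elsewhere in this study. Extending the rule to homozygous calls, the function name, and the signal values are hypothetical.

```python
from typing import Dict

FOLD = 1.2


def genotype(signals: Dict[str, float], wild_type: str, alternate: str,
             fold: float = FOLD) -> str:
    """Toy genotype call from the two allele-specific probes at one position."""
    wt, alt = signals[wild_type], signals[alternate]
    if alt * fold >= wt and wt * fold >= alt:  # both alleles within 1.2-fold
        return f"{wild_type}/{alternate}"
    return f"{wild_type}/{wild_type}" if wt > alt else f"{alternate}/{alternate}"


if __name__ == "__main__":
    # Hypothetical intensities, not values read from Figure 4.
    print(genotype({"G": 1000.0, "T": 900.0, "A": 80.0, "C": 60.0}, "G", "T"))
    print(genotype({"G": 1000.0, "T": 150.0, "A": 80.0, "C": 60.0}, "G", "T"))
```

The first call resembles the GM08388 pattern (G and T probes both bright), the second a G/G sample such as GM11261.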

Figure 4 Chip image comparisons for a heterozygous nucleotide substitution. (a) False-colored image of GM08388 target hybridization to an ATM sense array (1.2 × 1.2 cm). Probes with the greatest hybridization signal are shown in white and red; those of lowest signal are shown in blue and black. A magnified view of the probe sets (50-µm feature size) interrogating nucleotide position 5932 in samples GM11261 (5932 G/G) and GM08388 (5932 G/T) is shown in b and c, respectively. The identity of the interrogated ATM cDNA nucleotide positions 5922-5942 is given with IUPAC ambiguity codes, and position 5932 is underlined. [Panel sequence label: TCTTGCATTTKAAGAAGGAAG]

Table 4. Confirmed Newly Identified Probable Neutral Sequence Variations Detected by DNA Chip Analysis

Sample        Status    Relationship                        Sequence change  Protein change  State  Loss of signal[a] (coding, nc[c])  Gain of signal[b] (coding, nc[c])
GM09585       carrier   father of GM09587B                  735 C→T          V245V           het    ++, ++                             +
GM03188A      carrier   mother of GM03189C                  4578 C→T         P1526P          hom    ++                                 +
GM02781B      carrier   mother of GM02782B                  5558 A→T         D1853V          het    ++                                 +
(illegible)   affected  offspring of GM09585 and GM09588A   735 C→T          V245V           het    ++, (illegible)
GM03189C[d]   affected  offspring of GM03187A and GM03188A  4578 C→T         P1526P          het    +/-, ++
(illegible)   affected  proband                             1254 A→G         Q418Q           het    ++, ++
GM11261[d]    (remaining entries not legible in source)
NA11254[d]    affected  proband                             4578 C→T         P1526P          het    ++, ++

[a] (++) Peak height ≥1.3; (+/-) peak height ≥1.2 and <1.3; (-) peak height <1.2.
[b] (+) Allele-specific probe signal >1.2× wild-type probe; (-) allele-specific probe signal <1.2× wild-type probe.
[c] (nc) Noncoding strand.
[d] Wright et al. (1996).

Although gain of signal analysis was not used directly in mutational analysis because of the absence of probes interrogating for insertions and deletions, single nucleotide substitution probes were evaluated for their sensitivity toward detecting likely mutations (Table 3) or neutral variants that do not affect the function of the ATM gene (Table 4). Applying the simple base-calling algorithm described for the iterative multiexon PCR strategy, we assayed whether the gain of signal probes could be used to detect single nucleotide changes. In 5 of 17
heterozygous single nucleotide substitutions, the 
gain of signal probes for the mutant allele did not 
show hybridization signal for either strand within 
1.2-fold intensity of the wild-type probe. The fact 
that these base substitutions could not be detected 
demonstrates that single nucleotide substitution 
probes lack the predictable hybridization properties 
necessary for high sensitivity mutation screening 
purposes. On the other hand, the loss of hybridiza- 
tion signal assay clearly identified these substitu- 
tions missed by the gain of hybridization signal as- 
say. Nevertheless, in one case the gain of hybridiza- 
tion signal probes could be used to supplement the 
loss of signal analysis. This involved the identifica- 
tion of the 8266 A — > T base substitution in 
GM03188A that was missed by the loss of signal 
assay attributable to unstable baseline values in that 
particular analysis. Interestingly this mutation was 
detected in sample GM03189C using the loss of hy- 
bridization signal assay. 
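The two scoring rules in footnotes a and b of Table 4 amount to simple thresholds. The following is a minimal Python sketch, assuming that loss of signal peak heights and gain of signal probe intensities have already been extracted; the function names are illustrative, and because footnote b does not say how a ratio of exactly 1.2 is scored, this sketch treats it as negative.

```python
def score_loss_of_signal(peak_height: float) -> str:
    """Score a loss of hybridization signal peak (Table 4, footnote a)."""
    if peak_height >= 1.3:
        return "++"
    if peak_height >= 1.2:
        return "+/-"
    return "-"


def score_gain_of_signal(allele_signal: float, wild_type_signal: float) -> str:
    """Score an allele-specific probe against its wild-type probe (footnote b).

    A ratio of exactly 1.2 is not covered by footnote b; this sketch
    scores it as negative.
    """
    if wild_type_signal <= 0:
        raise ValueError("wild-type probe signal must be positive")
    return "+" if allele_signal > 1.2 * wild_type_signal else "-"


# A peak of height 1.25 is marginal; a mutant probe at 1.5x wild type scores positive.
print(score_loss_of_signal(1.25))
print(score_gain_of_signal(1500.0, 1000.0))
```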

DISCUSSION 

Of the 7 carriers and 15 affected individuals in the blinded ATM sequence analysis, there is a theoretical total of 37 mutant ATM alleles, assuming one mutant allele per carrier and two per patient sample (Table 3). However, sample GM02782B is reported to contain three separate cDNA mutations (Wright et al. 1996), which could be the result of three separate mutations (bringing the total up to 38 mutant alleles) or of complex alternative splicing events. We chose to use the value of 38 mutant alleles, of which 28 (found in either the homozygous or heterozygous state) were previously known for these samples (Gilad et al. 1996a,b; Wright et al. 1996).
An additional seven alleles likely to cause the AT phenotype were discovered using the ATM DNA chip assay (Tables 3 and 4). Three of these involved heterozygous base substitutions, 8585TG→AA and 9022 C→T (found twice), leading to nonconservative amino acid changes. In fact, the 8585TG→AA sequence variation alters two residues in the highly conserved PI 3-kinase domain. Three other nucleotide substitutions, 4852 C→T and 5932 G→T (found twice), were nonsense mutations. The remaining changes, 1141insGACA (found twice), involved a 4-bp insertion that was detected in sample GM03189C but not in GM03187A, the father of GM03189C. The mutant allele in sample GM03187A was not known previously; however, because sample GM03189C was found to contain the 1141insGACA allele, GM03187A was subjected to dideoxy sequencing analysis and found to be a carrier.

The DNA chip-based hybridization analysis helped to elucidate the genomic basis for two distinct mutations previously described in patient cDNAs. Loss of hybridization signal analysis indicated a possible 3' splice junction sequence change in sample GM02782B, which was reported to contain the 2251del19 cDNA mutation (Wright et al. 1996). Dideoxy sequencing analysis of subcloned PCR products revealed the heterozygous 2251-1 G→A genomic DNA base substitution, which alters the 3' acceptor splice site. Furthermore, the genomic DNA change responsible for the complete lack of exon 32 in the ATM cDNA of sample AT29RM had not been reported (Gilad et al. 1996b). Exon 32 from sample AT29RM failed to amplify using standard primer sets, as first indicated by the lack of hybridization signal at exon 32-specific probes and confirmed by the absence of PCR product when this exon was amplified and analyzed individually by agarose gel electrophoresis. Another intronic primer pair (31A 5'-TTTCAGAGTAATTTTCCAGAAC-3' and 31B 5'-CACTCAAATCCTTCTAACAATACT-3') was used to amplify exon 32 from sample AT29RM. The PCR product was subcloned and individual colonies were sequenced, revealing a homozygous 10-bp deletion (4611delG + 9) that extends 9 bases into the intronic region and alters the 5' donor splice junction sequence.

In addition to finding these new mutations and sequence variations, the DNA chip assay helped to correct other genotype assignments. For example, the 7517del4 frameshift mutation in AT patient sample AT22RM, previously reported as homozygous (Gilad et al. 1996b), was found to be heterozygous by dideoxy sequencing analysis. The initial assignment was questioned because the previously unreported 4852 C→T nonsense mutation was found in the same sample. The original investigators indicated that the homozygous genotype could be a misassignment, as it was based on RNA sequence analysis, in which message derived from one mutant allele in a compound heterozygote could be unstable and thus not scored (Gilad et al. 1996b). This highlights an inherent drawback of using RNA for mutational analysis: sequence changes can be missed if they sufficiently shorten the lifetime of the mutant RNA.



Currently 36 of 38 possible mutant alleles in these samples are known. The remaining 2 alleles, one from the carrier GM02781B (whose D1853V change is probably a polymorphism, as it is not found in the affected offspring GM02782B) and one from the affected GM11254, are as yet undiscovered. These alleles may have been missed because neither the DNA chip nor the alternative assays is sensitive to them. However, it is also possible that other ATM mutations reside further into intronic regions or in the promoter region. In addition, large-scale genomic deletions could be present that would confound any PCR-based assay. Of the 36 known ATM mutant alleles, 34 occur within the genomic regions interrogated by the DNA chip-based assay. A total of 31 of these 34 mutant alleles, counting each as many times as it is found in the sample set, were detected using the ATM chip, giving an overall sensitivity of 91%.
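The allele bookkeeping in the preceding paragraphs can be checked with a few lines of arithmetic. This sketch uses only numbers stated in the text and assumes that the seven chip-detected alleles include the 1141insGACA allele in GM03189C, with the copy in GM03187A counted separately as found by follow-up dideoxy sequencing.

```python
# Allele bookkeeping for the blinded ATM study (all counts taken from the text).
carriers, affecteds = 7, 15

theoretical = carriers * 1 + affecteds * 2  # one mutant allele per carrier, two per patient
assumed_total = theoretical + 1             # GM02782B counted as carrying a third allele

previously_known = 28
found_by_chip = 7          # new alleles detected with the ATM chip
found_by_follow_up = 1     # assumed: 1141insGACA in GM03187A, confirmed by sequencing
known_now = previously_known + found_by_chip + found_by_follow_up

in_assayed_region = 34     # known alleles lying within chip-interrogated regions
detected = 31
sensitivity = detected / in_assayed_region

print(theoretical, assumed_total, known_now, f"{sensitivity:.0%}")
```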

A number of factors may confound hybridization analysis and lead to false-positive and false-negative mutation detection assignments. Length differences between reference and test RNA species, as well as internal dye or hapten incorporation, may affect assay sensitivity and specificity. Hybridization will be affected by intra- or intermolecular structures caused by test target sequence variations, in addition to the thermodynamic properties of mismatched target/probe duplex formation. Repetitive nucleotide tracts and duplications have been recognized as posing significant challenges to hybridization-based assays (Hacia et al. 1996). In these sequence contexts, the potential for cross-hybridization is increased because probe slippage can form stable partial target/probe duplexes. The variant target sequence may then still bind to wild-type probes and reduce the sensitivity of the loss of signal assay. Increasingly sophisticated hybridization data analysis algorithms will have to be developed to compensate for potential losses of mutation detection specificity in these sequence contexts.

The false-positive mutation detection rate (five identified in >200 kb scanned) in the blinded ATM mutational analysis study was higher than that found in a previous nonblinded two-color DNA chip-based analysis of BRCA1 exon 11 samples (none identified in >120 kb scanned). This is primarily attributable to the lower number of mutation-specific probes per target nucleotide in the ATM arrays relative to the BRCA1 exon 11 arrays, on which single base-pair insertion as well as 1- to 5-bp deletion gain of signal probes were represented. In the absence of insertion/deletion probes, it was not possible to eliminate ATM loss of signal peaks with marginal widths from consideration, because there was no confirmatory evidence from gain of signal probes specifically interrogating these sequence changes. Nevertheless, such false positives carry relatively little risk, as they were few in number and confirmatory dideoxy sequencing was always done before calling a mutation.

We have demonstrated the efficient use of oli- 
gonucleotide arrays to screen for heterozygous and 
homozygous sequence variations in the large mul- 
tiexon ATM gene. The ability of hybridization-based 
assays to analyze amplicons of identical sizes simplifies multiexon PCR amplification reactions. Although this limits independent assessment of target quality (amplicons of identical size cannot be resolved by simple agarose gel electrophoresis of the PCR reactions), highly robust multiexon PCR reactions can nonetheless be developed rapidly (Wang et al. 1998). Encouraging results from the blinded ATM mutational analysis and advances in high-density oligonucleotide array manufacturing have now prompted the design of a second-generation pair of ATM chips, designed to interrogate sense and antisense strand sequences and collectively containing >260,000 probes. Future ATM array designs with greater perfect match probe redundancy and additional mutation-specific oligonucleotide probes, such as those representing every possible single nucleotide insertion and small deletions, should enhance the reproducibility, sensitivity, and specificity of the DNA chip-based assay. Furthermore, including additional wild-type probes based
on common ATM polymorphic allele sequences will 
optimize mutation detection in regions surround- 
ing these nucleotide changes. These chips could be 
used to help resolve the controversy surrounding 
cancer risks to ATM carriers, as well as in screening 
for sequence variations in the ATM gene for several 
AT-like diseases (Gilad et al. 1998a). Because DNA
chip technology has the potential to analyze virtu- 
ally any gene for heterozygous sequence variations, 
the hybridization-based strategies used in this study 
should be applicable to mutational analysis in many 
other systems. 

METHODS 

PCR From Genomic DNA and RNA Target Preparation

PCR reactions were performed on genomic DNA isolated from human AT fibroblast or lymphoblast cell lines, obtained from the Coriell Institute or as a gift from Yossi Shiloh (Tel Aviv University, Israel), using the AmpliTaq Gold (Perkin-Elmer) PCR kit and the manufacturer's recommended protocols. The primer pairs used are listed in Table 1 and contain T3 (5'-ATTAACCCTACTAAAGGA-3') and T7 (5'-TAATACGACTCACTATAGGGA-3') promoter sequences for forward and reverse primers, respectively. Genomic DNA from reference and test samples was subjected to the multiexon PCR reactions listed in Table 2. Reference and test sample multiexon PCR reaction pools were combined separately and subjected to in vitro transcription reactions. These reactions were performed in 10-µl reaction volumes using T3 RNA polymerase transcription buffer (Promega), 0.7 mM of ATP, CTP, GTP, and UTP [or 5-methyl-UTP (Hacia et al. 1998c)], 10 mM DTT, 0.7 mM fluorescein-12-UTP or 0.15 mM biotin-16-UTP (Boehringer Mannheim) for reference and test samples, respectively, and 10 units of T3 or T7 RNA polymerase as indicated. To generate and optimize multiexon PCR reactions, an iterative hybridization-based strategy was used (Fig. 3). Target was generated from multiexon PCR reactions through in vitro transcription reactions using biotin-UTP.

Reference and test sample in vitro transcription products were diluted into a 25-µl solution of 30 mM MgCl2 and incubated at 94°C for 15 min to fragment targets (Hacia et al. 1996). Cofragmented targets were diluted 1:100 into a 300-µl volume of hybridization buffer [3 M TMACl (tetramethylammonium chloride), 1x TE (pH 7.4), 0.001% Triton X-100, 1 nM 5'-fluorescein-labeled control oligonucleotide 5'-CGGTAGCATCTTGAC-3' (designed to hybridize to specific array probes to aid in image alignment)] and passed through a 0.22-µm syringe filter.

Chip Hybridization and Data Analysis 

Target was hybridized to either the sense or antisense ATM arrays in a 250-µl volume for 4 hr at 42°C. The chip surface was washed with 10 ml of wash buffer (6x SSPE, 0.001% Triton X-100) and stained with phycoerythrin-streptavidin conjugate (Molecular Probes; 2 µg/ml in wash buffer) for 5 min at room temperature. The chip was washed with 5 ml of wash buffer, and data were accumulated using a scanning confocal microscope equipped with a 488-nm argon laser [GeneChip Scanner (Affymetrix)]. After passing through 515- to 545-nm (green) band-pass and 560-nm (red) long-pass emission filters, fluorescent hybridization signals were detected using a photomultiplier tube.

GeneChip Software (Affymetrix) was used to produce 
digitized images of fluorescent target hybridized to arrays by 
converting photomultiplier tube output into spatially ad- 
dressed pixel values. Probe signal intensities are calculated 
from the mean of the non-outlier photon counts for each 
feature (i.e., per probe). The contributions of reference and 
test targets to each probe hybridization signal were extracted 
from each set of green reference and red test images using 
custom software. 
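A per-feature intensity of the kind described above can be sketched as follows. The text does not state how outlier photon counts were identified, so this sketch assumes a median-absolute-deviation (MAD) rule with a hypothetical cutoff k = 3; the function name is illustrative and is not part of the GeneChip Software.

```python
import numpy as np


def feature_intensity(pixel_counts, k=3.0):
    """Mean photon count of one feature after discarding outlier pixels.

    ASSUMPTION: a pixel is an outlier if it lies more than k median
    absolute deviations (MAD) from the feature median; k = 3 is arbitrary.
    """
    x = np.asarray(pixel_counts, dtype=float).ravel()
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    # If MAD is 0, only pixels equal to the median are kept.
    keep = np.abs(x - median) <= k * mad
    return float(x[keep].mean())


# A feature whose pixels read ~100 counts, plus one aberrant bright pixel.
print(feature_intensity([100, 102, 98, 101, 99, 500]))
```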



Sequence Normalization Algorithms 

To correct for reproducible fluctuations in the ratio of test and 
reference perfect match probe hybridization signals present 
after normalization against data generated from separate two- 
color cohybridization experiments, we calculated 79 quanti- 
ties for each perfect match probe that form the basis for a first 
round of signal normalization. Of these, 62 are specific for 
probes in a given exon (representing individual multiplicative
correction factors for each of the 62 ATM coding exons) and
are set to a value of one for the exon containing a probe and 
to zero for all other exons. Two additional quantities reflect 
multiplicative intra- and intermolecular probe structure nor- 
malization scores. These attempt to normalize for inter- and 
intramolecular array probe hybridization and secondary 
structures that may also contribute to fluctuations in perfect 
match probe signal ratios. 

We developed an algorithm to predict the potential for duplex formation between adjacent probes within a feature of the array (intermolecular structure; each feature contains ~10^7 probes of identical sequence) as well as within a given probe (intramolecular structure). The inter- and intramolecular probe structure predictions differ in that complementary nucleotides fewer than five positions apart are considered only in the intermolecular analysis and are excluded from the intramolecular analysis. This reflects the need for a loop of at least four residues in a stable intramolecular hairpin structure (Zucker 1994).

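The intermolecular probe structure normalization score is written as

$$P_{\mathrm{inter}} = \sum_{i,j} F(S_i, S_j)\cdot \mathrm{run}\cdot \exp\left\{-\left[\frac{i - N/2}{\mathrm{width}}\right]^2\right\}\cdot \exp\left\{-\left[\frac{j - N/2}{\mathrm{width}}\right]^2\right\}$$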

where $i$ and $j$ run over the positions in the probe sequence (positions 1 to 25 for a 25-mer probe) and $N$ is the number of bases in the probe; $F(S_i, S_j)$ was set to 3 if the $i$th and $j$th positions form a GC base pair, 2 if they form an AT base pair, and 0 otherwise. These arbitrary values serve only as rough estimates of base-pair stability; in principle, more sophisticated algorithms using more detailed thermodynamic relationships could further refine this correction factor (Zucker 1994).

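The intramolecular probe structure normalization score is similar, except that pairs of positions fewer than five apart are excluded:

$$P_{\mathrm{intra}} = \sum_{|i-j| \ge 5} F(S_i, S_j)\cdot \mathrm{run}\cdot \exp\left\{-\left[\frac{i - N/2}{\mathrm{width}}\right]^2\right\}\cdot \exp\left\{-\left[\frac{j - N/2}{\mathrm{width}}\right]^2\right\}$$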
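The two probe structure scores can be sketched in Python as follows. The quantities run and width are not defined in this section, so this sketch assumes that run is the number of consecutive antiparallel complementary pairs containing (i, j) and treats width as a free parameter (the value 6.0 in the example is arbitrary). Position weights are Gaussians centred at N/2, and pairs fewer than five positions apart are skipped in the intramolecular case; all function names are hypothetical.

```python
import math

# F(S_i, S_j): 3 for a GC pair, 2 for an AT pair, 0 otherwise (values from the text).
_PAIR = {("G", "C"): 3, ("C", "G"): 3, ("A", "T"): 2, ("T", "A"): 2}


def pair_weight(a: str, b: str) -> int:
    return _PAIR.get((a, b), 0)


def run_length(seq: str, i: int, j: int, min_sep: int = 0) -> int:
    """ASSUMED meaning of 'run': consecutive antiparallel complementary pairs
    through (i, j); pairs closer than min_sep positions are not allowed."""
    if pair_weight(seq[i], seq[j]) == 0:
        return 0
    n = 1
    for step in (-1, 1):
        a, b = i + step, j - step  # antiparallel: one index moves up, the other down
        while (0 <= a < len(seq) and 0 <= b < len(seq)
               and abs(a - b) >= min_sep and pair_weight(seq[a], seq[b])):
            n += 1
            a, b = a + step, b - step
    return n


def structure_score(seq: str, width: float, intramolecular: bool) -> float:
    """Sum of F(S_i, S_j) * run * Gaussian position weights over all i, j."""
    n = len(seq)
    min_sep = 5 if intramolecular else 0  # hairpins need a loop of >= 4 residues
    total = 0.0
    for i in range(n):
        for j in range(n):
            if abs(i - j) < min_sep:
                continue
            f = pair_weight(seq[i], seq[j])
            if f == 0:
                continue
            run = run_length(seq, i, j, min_sep)
            # Positions are 1-based in the text, hence i + 1 and j + 1.
            wi = math.exp(-(((i + 1) - n / 2) / width) ** 2)
            wj = math.exp(-(((j + 1) - n / 2) / width) ** 2)
            total += f * run * wi * wj
    return total


probe = "GGGGAAAAACCCC"
p_inter = structure_score(probe, width=6.0, intramolecular=False)
p_intra = structure_score(probe, width=6.0, intramolecular=True)
print(p_inter, p_intra)
```

Because the intramolecular score sums over a subset of the intermolecular pairs with runs that can only be shorter, it never exceeds the intermolecular score for the same probe in this sketch.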



The 15 quantities related to perfect match probe ratio correction are derived from the probe sequence composition. We address this issue by systematically searching for patterns in baseline fluctuations that appear to act as a function of nucleotide identity at a particular position. For example, consider a hypothetical situation in which the average of the perfect match probe ratios is one for all probes except those having an A in the fifth position, where the average ratio is three. By dividing the perfect match probe signal ratios generated by these probes by three, one can correct for systematic variation in these intensity ratios. A more complete way of formulating the multiplicative correction factor is to take one type of base (e.g., T) as a reference and have a correction factor for each of the other three bases at each probe position. For a 25-nucleotide probe, this results in a scheme that involves 75 variables (three correction factors for each of the 25 nucleotides in the probe). To simplify this process, we have chosen a function with five terms (a fourth-degree polynomial function of nucleotide position within a probe sequence, fitted to estimate the function of 75 variables) to describe and correct for probe sequence composition effects. For example, if T is (arbitrarily) chosen to be the reference base type, we calculate

$$C_{X,l} = \sum_{i=1}^{N} \delta\big(X, \mathrm{seq}(i)\big)\, L_{l-1}\!\left(\frac{i}{N}\right),$$

where $X$ is A, C, or G; $l$ is 1, 2, 3, 4, or 5; $N$ is the number of bases in the probe sequence; $\delta(X, Y)$ is one if $Y$ is the same as $X$ and zero otherwise; $\mathrm{seq}(i)$ is the identity of the $i$th base in the probe; and $L_k(x)$ is the $k$th Legendre polynomial. The first five Legendre polynomials are $L_0(x) = 1$; $L_1(x) = x$; $L_2(x) = \tfrac{3}{2}x^2 - \tfrac{1}{2}$; $L_3(x) = \tfrac{5}{2}x^3 - \tfrac{3}{2}x$; and $L_4(x) = \tfrac{35}{8}x^4 - \tfrac{15}{4}x^2 + \tfrac{3}{8}$.
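The 79 per-probe quantities can be assembled as sketched below: 62 exon indicators, the two structure scores, and the 15 Legendre composition terms with T as the reference base (3 bases x 5 polynomials). This is an illustration only; the 0-based exon numbering and the function names are hypothetical, and the structure scores are passed in as precomputed numbers.

```python
# Legendre polynomials L_0..L_4 as listed in the text.
LEGENDRE = (
    lambda x: 1.0,
    lambda x: x,
    lambda x: 1.5 * x**2 - 0.5,
    lambda x: 2.5 * x**3 - 1.5 * x,
    lambda x: 4.375 * x**4 - 3.75 * x**2 + 0.375,
)


def composition_terms(seq: str, reference_base: str = "T") -> list:
    """The 15 quantities C_{X,l} = sum_i delta(X, seq(i)) * L_{l-1}(i/N)."""
    n = len(seq)
    terms = []
    for base in "ACGT":
        if base == reference_base:
            continue
        for l in range(1, 6):
            terms.append(sum(LEGENDRE[l - 1](i / n)
                             for i in range(1, n + 1) if seq[i - 1] == base))
    return terms


def feature_vector(seq, exon_index, p_inter, p_intra, n_exons=62):
    """79 quantities: 62 exon indicators, 2 structure scores, 15 composition terms.

    exon_index is a hypothetical 0-based exon number.
    """
    exons = [0.0] * n_exons
    exons[exon_index] = 1.0
    return exons + [p_inter, p_intra] + composition_terms(seq)


v = feature_vector("ACGT" * 6 + "A", exon_index=3, p_inter=1.0, p_intra=0.5)
print(len(v))
```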

If we group these 79 quantities together into a 79-dimensional vector, denoted $C_i$ (where $i$ runs from 1 to 79), then our correction scheme starts with finding the best least-squares fit to the logs of the measured perfect match intensity ratios. We need to find the values of the 79 parameters $X_i$ that minimize the quantity

$$\sum_{\text{probes}} \left(\log K_{\text{measured}} - \log K_{\text{fit}}\right)^2 = \sum_{\text{probes}} \left(\log K_{\text{measured}} - \sum_{i=1}^{79} C_i X_i\right)^2,$$

where the sum over probes is taken over all perfect match probes; $K_{\text{measured}}$ is the measured value of the intensity ratio of ratios; and $K_{\text{fit}}$ is a fit to the measured ratios. The $C_i$ values are different for each probe sequence; the $X_i$ values are the same for all probes. The least-squares parameters $X_i$ are found by solving the normal equations by Cholesky decomposition (Branham 1990). Then the corrected ratio for each probe, $K_{\text{corrected}}$, is given by the residual to the fit:

$$\log K_{\text{corrected}} = \log K_{\text{measured}} - \log K_{\text{fit}}.$$
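The fit and correction steps can be sketched with NumPy by forming the normal equations and solving them through a Cholesky factorization (`np.linalg.cholesky` followed by two linear solves). This is an illustration, not the authors' custom software; it assumes the normal-equation matrix is positive definite (no exactly collinear columns), and the example uses synthetic data.

```python
import numpy as np


def fit_and_correct(C, log_k_measured):
    """Least-squares fit of log K_fit = C @ X via the normal equations.

    C: (n_probes, n_params) array; row p holds the quantities C_i for probe p.
    log_k_measured: (n_probes,) array of log measured ratio of ratios.
    Returns the fitted parameters X and log K_corrected (the fit residuals).
    """
    C = np.asarray(C, dtype=float)
    y = np.asarray(log_k_measured, dtype=float)
    A = C.T @ C                  # normal-equation matrix, assumed positive definite
    b = C.T @ y
    L = np.linalg.cholesky(A)    # A = L @ L.T with L lower triangular
    z = np.linalg.solve(L, b)    # solve L z = b
    X = np.linalg.solve(L.T, z)  # solve L.T X = z
    log_k_fit = C @ X
    return X, y - log_k_fit      # log K_corrected = log K_measured - log K_fit


# Synthetic check: log ratios generated exactly from known parameters.
rng = np.random.default_rng(0)
C = rng.normal(size=(200, 5))
X_true = np.array([0.5, -1.0, 0.2, 0.0, 2.0])
X_fit, log_k_corrected = fit_and_correct(C, C @ X_true)
print(np.round(X_fit, 6))
```

Solving the normal equations with a Cholesky factor is cheap when, as here, there are many more probes than parameters; for badly conditioned designs a QR-based solver would be numerically safer.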

After this signal normalization process, reference/test 
sample perfect match probe hybridization signal ratios are 
truncated to values between 1.5 and 0.67, as described in the 
text. Finally, corrected test/reference perfect match probe hy- 
bridization signal ratios were averaged against those gener- 
ated from individual comparisons to data sets generated from 
10 separate ATM chip hybridization experiments. This pro- 
vides a final set of multiplicative correction factors to mini- 
mize systematic fluctuations in hybridization signal ratios. 
Afterward the reference/test perfect match probe signal ratios 
are plotted against their respective nucleotide positions using 
Microsoft Excel 7.0. Loss of hybridization peaks are scored 
based on peak height and width as described in the text. 
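One possible reading of these final normalization steps is sketched below with NumPy: ratios are clipped to the interval [0.67, 1.5], and each probe's ratio is divided by the mean of its clipped ratios from the 10 reference comparisons. The averaging procedure is not fully specified in this section, so the per-probe mean, the clipping of the reference profiles, and the function name are all assumptions.

```python
import numpy as np


def final_ratios(ratios, reference_ratio_profiles, lo=0.67, hi=1.5):
    """Clip ratios to [lo, hi] and divide by per-probe reference correction factors.

    ASSUMPTION: the correction factor for each probe is the mean of its clipped
    ratios across the reference comparisons (one row per comparison).
    """
    r = np.clip(np.asarray(ratios, dtype=float), lo, hi)
    ref = np.clip(np.asarray(reference_ratio_profiles, dtype=float), lo, hi)
    factors = ref.mean(axis=0)  # one multiplicative factor per probe
    return r / factors


# Three probes; ten reference comparisons that show no systematic bias.
out = final_ratios([2.0, 1.0, 0.5], np.ones((10, 3)))
print(out)
```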



Dideoxy Sequencing Analysis 

PCR primer pairs (Table 1) containing M13 forward (5'-GTTTTCCCAGTCACGACG-3') and reverse (5'-AGGAAACAGCTATGACCAT-3') sequences were designed to amplify each of the 62 ATM coding exons. Individual exons were amplified with the AmpliTaq Gold System (Perkin-Elmer) following the manufacturer's recommended protocol. Dye primer dideoxy sequencing reactions were performed using the DYEnamic Energy Transfer Dye Primer Sequencing Kit (Amersham Life Science) with the suggested protocol and the supplied M13 forward or M13 reverse primers.

In cases where the dye primer sequencing strategy did
not definitively indicate the presence of a heterozygous se-
quence variation in a test sample, a subcloning strategy was
used. PCR product of interest was subcloned using Zero Blunt
Cloning Kit (Invitrogen) and inserts from individual colonies
were sequenced using dye terminator chemistry.